Mining biosynthetic gene clusters in Paenibacillus genomes to discover novel antibiotics

Background Bacterial antimicrobial resistance poses a severe threat to humanity, necessitating the urgent development of new antibiotics. Recent advances in genome sequencing offer new avenues for antibiotic discovery. Paenibacillus genomes encompass a considerable array of antibiotic biosynthetic gene clusters (BGCs), rendering these species as good candidates for genome-driven novel antibiotic exploration. Nevertheless, BGCs within Paenibacillus genomes have not been extensively studied. Results We conducted an analysis of 554 Paenibacillus genome sequences, sourced from the National Center for Biotechnology Information database, with a focused investigation involving 89 of these genomes via antiSMASH. Our analysis unearthed a total of 848 BGCs, of which 716 (84.4%) were classified as unknown. From the initial pool of 554 Paenibacillus strains, we selected 26 available in culture collections for an in-depth evaluation. Genomic scrutiny of these selected strains unveiled 255 BGCs, encoding non-ribosomal peptide synthetases, polyketide synthases, and bacteriocins, with 221 (86.7%) classified as unknown. Among these strains, 20 exhibited antimicrobial activity against the gram-positive bacterium Micrococcus luteus, yet only six strains displayed activity against the gram-negative bacterium Escherichia coli. We proceeded to focus on Paenibacillus brasilensis, which featured five new BGCs for further investigation. To facilitate detailed characterization, we constructed a mutant in which a single BGC encoding a novel antibiotic was activated while simultaneously inactivating multiple BGCs using a cytosine base editor (CBE). The novel antibiotic was found to be localized to the cell wall and demonstrated activity against both gram-positive bacteria and fungi. The chemical structure of the new antibiotic was elucidated on the basis of ESIMS, 1D and 2D NMR spectroscopic data. The novel compound, with a molecular weight of 926, was named bracidin. Conclusions This study outcome highlights the potential of Paenibacillus species as valuable sources for novel antibiotics. In addition, CBE-mediated dereplication of antibiotics proved to be a rapid and efficient method for characterizing novel antibiotics from Paenibacillus species, suggesting that it will greatly accelerate the genome-based development of new antibiotics. Supplementary Information The online version contains supplementary material available at 10.1186/s12866-024-03375-5.


Background
A recent survey, encompassing 471 million individual records/isolates from 204 countries and territories, revealed that bacterial antimicrobial resistance (AMR) caused 4.95 million deaths [1].AMR poses a severe threat to humanity, necessitating the urgent development of new antibiotics.Traditional methods for antibiotic discovery involving activity screening of soil-derived microorganisms have been abandoned owing to the rediscovery of known compounds [2].Alternative methods, such as developing synthetic antibiotics through high-throughput screening and rational drug design, have limitations, mainly related to inadequate penetration of these antimicrobial agents through the bacterial cell wall and their narrow antimicrobial spectrum [2].Recently, machine learning methods have been introduced to expedite the exploration of chemical libraries for discovering novel antibiotics [3].Screening for new antibiotics from previously inaccessible or underexplored microbial sources has been successful [4,5].However, this approach remains challenging due to the presence of a large number of previously discovered compounds [6].Dereplication, the elimination of known antibiotics in microbial extracts, is a laborious and time-consuming process.Advanced techniques such as mass spectrometry and nuclear magnetic resonance-based metabolomics have been introduced to assist in dereplication [7].Nonetheless, they often require pure fractions, rendering them unsuitable for initial screening.Thus, a new platform is needed for antibiotic discovery.
Recent advances in genome sequencing have unveiled a wealth of untapped biosynthetic gene clusters (BGCs) in microbial genomes, offering new avenues for antibiotic discovery [8,9].Actinobacteria, renowned for their contributions to traditional antibiotics, have been the focus of large-scale genome mining efforts due to their diverse BGCs [9,10].Additionally, bacterial species in the order Bacillales have garnered attention as a resource for novel antibiotics [11][12][13].They produce antibiotics structurally distinct from Actinobacteria.Among the Bacillales members, antibiotic studies have focused on the family Bacillaceae, whereas strains belonging to the family Paenibacillaceae have been insufficiently explored.Our analysis of Paenibacillus genomes from the National Center for Biotechnology Information (NCBI) database revealed that 84.4% of the BGCs were uncharacterized.The result underscores the potential of Paenibacillus species as sources for discovering new antibiotics.Moreover, our research demonstrates that new antibiotics can be efficiently and rapidly characterized via cytosine base editor (CBE)-mediated genetic dereplication in a selected Paenibacillus strain.

Strains and culture conditions
The strains utilized in this study are detailed in Supplementary Table S1.For general cloning, Escherichia coli strain MC1061 [14] was employed.Paenibacillus strains were sourced from the Korean Agricultural Culture Collection (KACC) and the Korean Collection for Type Cultures (KCTC).Escherichia coli and Bacillus subtilis strains were cultured in Luria-Bertani medium (LB; Difco, Detroit, MI, USA) at 37 °C.Paenibacillus strains were cultivated in tryptic soy broth (TSB; Difco) or tryptic soy agar (TSA) at 30 °C.When necessary, the medium was supplemented with chloramphenicol (7.5 µg/mL for Paenibacillus brasilensis) or ampicillin (100 µg/mL).P. brasilensis transformation was conducted as previously described [15].Indicator strains from the American Type Culture Collection (ATCC), KCTC, or KACC for antimicrobial activity assays were cultured as follows: Bacillus cereus was grown in LB broth or agar at 30 °C; E. coli and Acinetobacter baumannii were grown in LB broth or agar at 37 °C; Micrococcus luteus was grown in TSB or TSA at 30 °C; Pseudomonas aeruginosa was grown in TSB or TSA at 37 °C; Pythium ultimum, Fusarium graminearum, and Rhizoctonia solani were grown in potato dextrose agar (PDA; Difco) at 25 °C.

Paenibacillus genome sequence collection and BGC analysis
A total of 554 Paenibacillus genome sequences were acquired from the NCBI Genome Sequence Database.BGCs for secondary metabolites were predicted using antiSMASH v6.1.1 [16].Distinguishing existing and new BGCs was based on BGC type, predicted domain, and knownClusterBlast confirmed using AntiSMASH.
single-guide RNA (sgRNA) cassettes, respectively.The amplified sgRNA cassettes and backbone plasmid pMGoldi-sCBE4 [15] were assembled using a modified Golden Gate assembly protocol [17] to construct pMisCBE4-RB5.P. brasilensis was transformed with pMisCBE4-RB5 using the previously described conjugation method, the modified integrative and conjugative element (MICE) [15].Randomly selected transformants were analyzed using DNA sequencing to confirm the mutations.Curing of plasmid pMisCBE4-RB5 was performed using a previously described method [17] to generate P. brasilensis RB5.The RB5 strain was confirmed to harbor the relevant mutations via DNA sequencing.

Construction of P. brasilensis RB5d5
The remaining BGC5 was deleted from the RB5 strain to construct the RB5d5 strain using a Bacillus integrative plasmid combining a synthetic gene circuit (BIPS) system reported previously [18].To construct a plasmid for deleting BGC5, two homologous arm fragments corresponding to the upstream and downstream regions of the target gene were amplified from the chromosome of P. brasilensis using the primer sets 1G-BRB5-FF/1G-BRB5-FR and 3GX-BRB5-BF/3GX-BRB5-BR, respectively.The chloramphenicol resistance gene (cat) under the control of the P spac promoter (P spac -cat fragment) was amplified from plasmid pA-xylR2 [19] using the primer set Pspac-F/cat-R.The homologous arm and P spac -cat fragments were cloned into pSGC4iN [18] using a modified Golden Gate assembly protocol to construct pSGC4iN-RB5d.After introducing pSGC4iN-RB5d into the conjugation donor MICEaRep [18], the plasmid was transformed into the recipient P. brasilensis RB5 using the MICE method to construct P. brasilensis RB5d5.The BGC5 mutation in RB5d5 was confirmed via sequencing.

Antimicrobial activity assay
The indicator strains were cultured for 16 h in the appropriate medium at 30 °C for B. cereus and M. luteus and at 37 °C for the other strains.Antimicrobial assay plates were prepared using LB or TSA agar containing a 1% culture of each indicator strain.Paenibacillus strains were grown in 2 mL of TSB at 30 °C for 16 h.After adjusting the cultures to an optical density at 600 nm (OD 600 ) of 2 in TSB medium, 5 µL of the cultures was spotted on each bioassay plate followed by incubation for 24 h (48 h for M. luteus) at 30 °C for B. cereus and M. luteus and 37 °C for the other strains.Subsequently, the diameters of the inhibition zones (mm) were measured.For methanol extraction, Paenibacillus strains were cultured in TSB at 30 °C for 24 h.Methanol extraction and antimicrobial activity assays of the extracts were performed using a previously reported method [20].

Antifungal activity assay
All fungi were grown on PDA plates at 30 °C for 3 days.Subsequently, 5 mm plugs of each fungus were placed in the center of a new PDA plate.Paenibacillus strains were cultured in 2 mL of TSB at 30 °C for 16 h.After adjusting the cultures to OD 600 of 2 in TSB medium, 5 µL of the cultures was spotted 2 cm away from the plug.The plates were incubated at 30 °C for 3 days.

Purification and structure determination of the novel compound
P. brasilensis RB5 was grown in a 250 mL Erlenmeyer flask with 50 mL of tryptic soy broth (TSB; Difco) seed culture medium for 24 h at 30 °C on a rotary shaker set at 180 rpm.For larger-scale cultivation, 1% of the seed culture was transferred into 40 baffled 1,000 mL Erlenmeyer flasks containing 250 mL of TSB broth.These were cultured for 48 h at 30 °C on a rotary shaker at 180 rpm.The resulting residue was partitioned with ethyl acetate three times and then evaporated to remove the solvent.The crude extract was fractionated using reversed-phase C 18 vacuum column chromatography with a stepwise solvent system of methanol and water (20:80 to 100:0 v/v, each in 1-liter volumes).The 80% fraction (120 mg) was further purified by reversed-phase HPLC (Cosmosil semipreparative C 18 , 30% acetonitrile, 3 mL/min, UV detection at 210, 265 nm) to obtain the desired compound.NMR spectra were recorded on Bruker AVANCE HD 900 NMR spectrometers at the Korea Basic Science Institute (KBSI) in Ochang, South Korea.Chemical shifts were referenced to the residual solvent signal (DMSOd 6 ; δ H 2.50, δ C 39.51).

Mining of BGCs in Paenibacillus genomes
In the order Bacillales, the exploration of BGCs within the family Paenibacillaceae has been notably limited compared to the Bacillaceae family.To address this, we delved into BGCs within Paenibacillus species, representative of the Paenibacillaceae family.We obtained 554 Paenibacillus genomes from NCBI (February 4, 2020) and classified them into four genome levels based on sequencing data quality and assembly level: complete, chromosome, scaffold, and contig levels, accounting for 81, 8, 197, and 268 sequences, respectively.Further analysis was conducted on 89 genomes at the complete and chromosome levels (Supplementary Table S4).Using antiSMASH, we analyzed the 89 genomes and detected a total of 848 BGCs, with an average of 9.5 BGCs per strain (Fig. 1).Of the 848 BGCs, 716 (84.4%) were classified as unknown and 671 (79.1%) were associated with the production of non-ribosomal peptide synthetases (NRPS), polyketide synthases (PKS), and bacteriocins.Of the 554 Paenibacillus strains whose genomes have been deposited at NCBI, we selected 26 strains from culture collections for more in-depth investigation (Table 1).

BGC analysis and antimicrobial activity evaluation of selected Paenibacillus strains
The genomic analyses of these 26 selected strains using antiSMASH revealed 312 BGCs, of which 255 consisted of NRPSs, PKSs, and potentially antibacterial bacteriocins (Table 2).Of the 255 BGCs, 221 (86.7%) were identified as unknown.The remaining 57 BGCs comprised ladderanes, siderophores, terpenes, resorcinol, beta-lactones, phosphonates, oligosaccharides, and ectoine.Strains containing five or more unknown BGCs were P. brasilensis KACC 2).Of the 26 strains, 20 showed antimicrobial activity against M. luteus, whereas only six showed antimicrobial activity against E. coli.Paenibacillus assamensis, P. brasilensis, P. ehimensis, Paenibacillus nuruki, and P. tianmuensis exhibited strong antimicrobial Paenibacillus thiaminolyticus, showed no antimicrobial activity, suggesting that their BGCs may be silent under standard laboratory conditions as previously reported [21] or the antibiotics produced may be outside the scope of the antibacterial assay in this study.
Of the 26 strains selected, the genome sequences of seven were determined completely.Of the seven strains, P. brasilensis was selected for further analysis to discover novel antibiotics derived from Paenibacillus species because of its strong antimicrobial activity against M. luteus.

Characterization of a new antibiotic from P. brasilensis
AntiSMASH analysis revealed the presence of 14 potential BGCs in P. brasilensis (Supplementary Fig. S1), of which BGC1, BGC3, BGC5, and BGC11 were classified as new BGCs based on gene structure and annotation.In addition, BGC1 was subclassified into BGC1a and BGC1b because it appeared to comprise two BGCs.Consequently, it was suggested that P. brasilensis contained five new BGCs (Fig. 2, A).Among them, BGC5  [20].Therefore, the BGC5 antibiotic was considered a novel antibiotic because its size and amino acid composition are different from those of fusaricidin B. Additionally, AntiSMASH annotated BGC11 as a BGC encoding fusaricidin B (Supplementary Figure S1).BGC11 comprises two separate BGCs: the first is similar to the fusaricidin BGC, and the second is a novel BGC.Upon comparing the first BGC with the fusaricidin BGC from P. polymyxa E681, it was found that the first BGC is considered a pseudogene, with 55% of the fusaricidin gene deleted.Consequently, the first BGC was excluded from the antibiotic BGC analysis.
To ensure accurate characterization, the BGC5 antibiotic required dereplication, preventing the influence of other antibiotics.We accomplished this by constructing the P. brasilensis RB5 strain, in which four BGCs were disabled to leave only BGC5 active, utilizing a CBE-mediated antibiotic dereplication method [17].Subsequently, the remaining BGC5 in RB5 was deleted using the BIPS Table 1 Selected Paenibacillus strains available in culture collections for an in-depth evaluation system [18] to construct the RB5d5 strain in which all five BGCs were inactivated.The BIPS system minimizes the potential for secondary mutations on the genome, thus safeguarding genetic integrity through traditional double-crossover recombination.We assessed the antimicrobial activities of wild-type, RB5, and RB5d5 P. brasilensis against gram-positive bacteria, gram-negative bacteria, and fungi.The wild-type strain demonstrated antimicrobial activity against gram-positive bacterium M. luteus (Fig. 3A) and fungi P. ultimum, R. solani, and F. graminearum (Fig. 3C).However, it proved ineffective against the gram-positive bacterium B. cereus and gramnegative strains E. coli, A. baumannii, and P. aeruginosa.The RB5 strain exhibited diminished antibacterial activity against M. luteus, while the RB5d5 strain displayed no antimicrobial activity.These findings emphasize the role of the BGC5 antibiotic in combating M. luteus and suggest that other antibiotics may contribute to its activity.As for antifungal activity, the RB5 strain showed performance similar to the wild-type strain, whereas the RB5d5 strain demonstrated reduced efficacy.Consequently, the antifungal activity of P. brasilensis was mainly dependent on the BGC5 antibiotic, although residual antifungal activity in RB5d5 suggests the presence of other metabolites with similar properties.In summary, BCG5 demonstrated antimicrobial activity against gram positive bacteria and fungi and shared similarities with fusaricidin antibiotics.
For the further characterization of the BGC5 antibiotics, cell pellets and culture supernatants of the RB5 were extracted with methanol and the antimicrobial activities of the extracts were compared to those of the RB5d5 strain.Antimicrobial activity was demonstrated only in the extract from RB5 (Fig. 3B), indicating that BGC5 encodes an antibiotic active against M. luteus.LC/ MS analysis supported this result, in that the peak corresponding to the BGC5 antibiotic disappeared in the RB5d5 extract (Fig. 4).Interestingly, antimicrobial activity was observed in the extract from the cell pellet of RB5, indicating that the BGC5 antibiotic was bound to the cell wall, similar to fusaricidin antibiotics [20].Although the size and amino acid composition of the BGC5 antibiotic a BGC analysis was performed using antiSMASH.The known and unknown BGCs are associated with non-ribosomal peptide synthetase, polyketide synthase, and bacteriocin production.Others include ladderane, siderophore, terpene, resorcinol, betalactone, phosphonate, oligosaccharide, and ectoine b -. no inhibition; +, < 1 mm zone; ++, 1-3 mm zone; +++ > 3 mm zone of growth inhibition were different to that of fusaricidin B, both antibiotics shared similar properties in terms of cellular location and antimicrobial spectrum.We determined the chemical structure of the BGC5 antibiotic using NMR spectroscopy (Supplementary Fig. S2 and Fig. S3).The structure was found to be a novel cyclic lipopeptide made up of seven amino acids (Fig. 5

Discussion
In classical methods, new antibiotics from natural resources can be identified using multiple processes, including the isolation of pure compounds and subsequent structural analysis.The processes are labor-intensive and time-consuming and are mostly unsuccessful due to the large number of known compounds.As sequencing costs have recently decreased dramatically, genome sequence data is accumulating exponentially [22].Advanced technologies now facilitate the culture of previously unculturable bacteria, which greatly expands the pool of bacterial genomes [23].In silico genomic analysis tools can mine BGCs quickly and in large quantities, making them a powerful engine in the development of new antibiotics.The most attractive advantage of the tools is that they can predict the novelty of antibiotics without laborious chemical purification and characterization processes [16].Several BGC databases, such as antiSMASH, ClusterMine360 [24], and IMG-ABC [25] are available to users.Therefore, genome-based antibiotic development strategies can be a useful weapon to combat the threat of AMR by the continuous supply of new antibiotics.
Antibiotics from Paenibacillus species, especially P. polymyxa E681, which has strong antibacterial and antifungal activities, have been well-studied.E681 produces at least six antibiotics, including polymyxin, fusaricidin, tridecaptin, paenilipoheptin, paenilan, and bacillaenelike antibiotics [26].Among these, polymyxin is used as a last-resort antibiotic for treating infections caused One of the most formidable challenges in antibiotic discovery is dealing with a vast collection of known compounds [2], necessitating the process of antibiotic dereplication to distinguish new compounds.In addition, the genome-based discovery of novel antibiotics requires dereplication because most bacteria contain multiple BGCs and likely produce antibiotic mixtures.Our genome analysis indicated that Paenibacillus strains typically contained an average of 9.5 BGCs per strain.Although traditional dereplication methods have improved with advanced analytical techniques [7]; they still require purification steps to obtain pure fractions, making them unsuitable as initial screening methods.Recently, an efficient genetic dereplication method was developed using the CBE system, which can construct a single antibiotic-producing strain by simultaneously inactivating multiple BGCs [17].However, CBE-based genetic dereplication must overcome the transformation barrier of the wild-type strain because it is difficult to transform foreign DNA into wild-type strains.Recently, it was reported that the broad-host-range conjugation system, MICE, facilitates the efficient use of genomeediting tools in various wild-type Bacillus strains [15].Here, we applied the CBE system to P. brasilensis using the MICE system, simultaneously inactivated the four BGCs to construct a single BGC5 antibiotic-producing strain and investigated the antibacterial activity of this new antibiotic.Therefore, we demonstrated that genetic dereplication via the MICE and CBE systems is also possible in Paenibacillus species, which can facilitate subsequent processes such as activity-based screening, purification, and characterization of antibiotics.
Despite the wealth of information unlocked by largescale genomic analyses, many BGCs remain inactive or silent under standard laboratory conditions [21].As our results showed, six Paenibacillus strains, including P. alvei, P. donghaensis, P. fonticola, P. harenae, P. pinihumi, and P. thiaminolyticus, possessed one or more BGCs; however, they did not show antibacterial activity against M. luteus or E. coli.It is possible that BGCs in these six strains as well as in many other Paenibacillus strains are cryptic.Therefore, the activation of silent BGCs is important for the discovery of new antibiotics.Several methods have been developed to activate silent BGCs, including the heterologous expression of BGCs in surrogate hosts, homologous expression under different fermentation conditions, screening of elicitors, co-cultivation, ribosome and RNA polymerase engineering, regulatory gene activation, histone modification, and metabolic remodeling for precursor supply [29].The heterologous expression of BGCs in surrogate hosts is limited by difficulties faced in cloning BGCs owing to their large size, incompatible regulatory systems, and absence of biosynthetic precursors or essential enzymes.Homologous expression, inducer screening, and co-culture methods can facilitate the production of antibiotic mixtures, suggesting that antibiotic dereplication is still required to characterize specific compounds.Ribosome and RNA polymerase engineering, regulatory gene activation, histone modification, and metabolic remodeling methods require both antibiotic dereplication and genome editing in wild-type strains.In this study, we constructed a single antibiotic-producing strain using MICE-and CBE-mediated genetic dereplication of antibiotics, which facilitated the characterization of a new antibiotic.Therefore, the construction of a single antibiotic-producing strain, as performed in this study, prior to the activation of silent BGCs, will greatly accelerate the genome-based development of new antibiotics.

Conclusions
Our analysis of 89 Paenibacillus genomes revealed 848 BGCs, with a significant proportion being uncharacterized.This highlights the potential of Paenibacillus species for genome-based antibiotic discovery.Moreover, we successfully constructed a single antibiotic-producing P. brasilensis strain using the MICE and CBE systems, which greatly expedited the characterization of the new antibiotic.This approach is not exclusive to Paenibacillus and can be extended to other genera within the Bacillaceae and Paenibacillaceae families.This strategy represents a promising avenue for antibiotic discovery, including the identification of new BGCs via genome analysis, construction of single antibiotic-producing strains using MICE and CBE systems, and subsequent characterization of novel antibiotics; it can be applied to unexplored strains of other genera belonging to the families Bacillaceae and Paenibacillaceae as well as other members of the order Bacillales.

Fig. 2 (
Fig. 2 (A) BGCs in the genome of Paenibacillus brasilensis.The numbers represent the size (in base pairs) of the gene.(B) Domain organization of a nonribosomal peptide synthetase encoded by BGC5.Modules 2, 4, and 6 contain epimerization domains that convert the corresponding amino acid to its D-form.BGC, biosynthetic gene cluster

Fig. 3 (
Fig. 3 (A) Antimicrobial activities of Paenibacillus brasilensis and its mutants against Micrococcus luteus.(B) Antimicrobial activities of the methanolextracted cell pellets and culture supernatants of RB5 and RB5d5 against M. luteus.(C) Antifungal activities of P. brasilensis and its mutants against fungi.Strain RB5 contains four inactivated BGCs, leaving only BGC5 functional.In RB5d5, all five BGCs are inactivated.BGC, biosynthetic gene cluster

Fig. 4
Fig. 4 Liquid chromatography analysis of cell pellet methanol extracts of RB5 (A) and RB5d5 (B) with mass spectrometry data at retention times of 8.09 (C), 8.25 (D), and 8.43 min (E), respectively

Table 2
Biosynthetic gene cluster (BGC) analysis and antimicrobial activity of selected Paenibacillus strains